library(tidyverse)
library(here)
library(rio)
Quarto provides a unified authoring framework for data science, combining your code, its results, and your prose. Quarto documents are fully reproducible and support dozens of output formats, like PDFs, Word files, presentations, and more.
Quarto files are designed to be used in three ways:
Quarto is a command line interface tool, not an R package. This means that help is, by and large, not available through ?. Instead, as you work through this chapter and use Quarto in the future, you should refer to the Quarto documentation (https://quarto.org/).
Note
Quarto documents are fully reproducible and support dozens of output formats, like PDFs, Word files, slideshows, and more.
Need some help?
Download Quarto: https://quarto.org/docs/get-started/
Quarto Guide: https://quarto.org/docs/guide/
Markdown Reference Sheet: Help > Markdown Quick Reference
You’ll need the Quarto command line interface (CLI), but RStudio installs and loads it for you automatically when needed.
Let us create one from RStudio now.
To create a new Quarto document (.qmd), select File -> New File -> Quarto Document in RStudio, then choose the output format you want to create. For now we will focus on an HTML document, which can easily be converted to other formats later.
Go ahead and give it a title.
The newly created .qmd file comes with basic instructions; let us go through them now.
It contains three important types of content:
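An optional YAML header, surrounded by lines of ---.
Chunks of R code, each surrounded by lines of three backticks.
Text mixed with simple formatting, such as ## headings.
YAML stands for yet another markup language or YAML ain’t markup language (a recursive acronym), which emphasizes that YAML is for data, not documents.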
In any case, it holds the metadata of the document and can be really helpful.
When you render a Quarto document, first knitr executes all of the code chunks and creates a new markdown (.md) document, which includes the code and its output. The markdown file generated is then processed by pandoc, which creates the finished format. The Render button encapsulates these actions and executes them in the right order for you.
Learn more about Markdown from the Guide: https://quarto.org/docs/authoring/markdown-basics.html
When you open a .qmd file, you get a notebook interface where code and output are interleaved. You can run each code chunk by clicking the Run icon (it looks like a play button at the top of the chunk), or by pressing Ctrl + Shift + Enter.
RStudio executes the code and displays the results inline with the code by default. However, you can send the results to the console instead by clicking on the gear icon and selecting the Chunk Output in Console option.
You can render the entire document with a single click of a button.
Go ahead and give it a try. RStudio might prompt you to save the document first; save it in your working directory under a suitable file name.
You should now see some output like this:
The knitr package extends the basic markdown syntax to include chunks of executable R code.
When you render the report, knitr will run the code and add the results to the output file. You can have the output display just the code, just the results, or both.
To embed a chunk of R code into your report, surround the code with two lines that each contain three backticks. After the first set of backticks, include {r}, which alerts knitr that you have included a chunk of R code. The result will look like this:
To omit the results from your final report (and not run the code), add the argument eval = FALSE inside the braces, after r. This places a copy of your code in the report without running it.
To omit the code from the final report (while including the results), add the argument echo = FALSE. This is very handy for adding plots to a report, since you usually do not want to see the code that generates the plot.
Read more about R code chunks at https://rmarkdown.rstudio.com/articles_intro.html. You can also change these options from the gear icon on the right of the code chunk.
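As a minimal sketch of how these chunk options behave, the snippet below passes three tiny one-chunk documents to knitr::knit() as text and prints the resulting markdown. The helper name knit_snippet is hypothetical and exists only for this illustration; in practice you would write the chunks directly in your .qmd file and click Render.

```r
library(knitr)

# Hypothetical helper: knit a tiny document supplied as a character vector
# and return the generated markdown as a single string.
knit_snippet <- function(lines) {
  knit(text = lines, quiet = TRUE)
}

# Default chunk: both the code and its result appear in the output.
default_out <- knit_snippet(c("```{r}", "1 + 1", "```"))

# eval = FALSE: the code is shown but never run, so there is no result.
no_eval_out <- knit_snippet(c("```{r, eval = FALSE}", "1 + 1", "```"))

# echo = FALSE: the code is run and hidden, only the result is shown.
no_echo_out <- knit_snippet(c("```{r, echo = FALSE}", "1 + 1", "```"))

cat(default_out, no_eval_out, no_echo_out, sep = "\n\n")
```

Comparing the three printed outputs side by side makes it easy to see which option hides the code and which hides the result.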
You can also evaluate R expressions inline by enclosing the expression in a pair of single backticks, with the letter r immediately after the opening backtick.
knitr will replace the inline code with its result in your final document (inline code is always replaced by its result). The result will appear as if it were part of the original text. For example, the snippet above will appear like this:
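The replacement of inline code can be tried out without rendering a whole document. This is only a sketch: it feeds a small document to knitr::knit() as text, and the variable names total and rendered are made up for the illustration.

```r
library(knitr)

# A tiny document: a hidden setup chunk followed by one sentence
# that contains an inline R expression.
doc <- c(
  "```{r, include = FALSE}", "total <- 1 + 1", "```",
  "",
  "The sum of 1 and 1 is `r total`."
)

# knitr evaluates the inline expression and splices its value into the text.
rendered <- knit(text = doc, quiet = TRUE)
cat(rendered, "\n")
```

The sentence in the output reads as ordinary prose, with the value in place of the inline code.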
Now let us try building our own .qmd document and add our own analysis. Let us use a new dataset for this purpose. So go ahead and delete everything below the YAML header.
Today we will use data on deaths due to COVID-19 in the state of Kerala. This information is available from the Government of Kerala COVID-19 Dashboard https://dashboard.kerala.gov.in/covid/
Let's begin!
Workflow with Quarto
Create a new project and open a new .qmd file in the project.
Load Packages
Load the Data
Check the dimensions of the data
You can alternatively use nrow() and ncol().
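A quick sketch of the three functions on a toy data frame (toy_df and its values are made up and stand in for the real data):

```r
# Toy data frame standing in for the real dataset (values are made up).
toy_df <- data.frame(
  district = c("Kollam", "Idukki", "Kannur"),
  age = c(67, 72, 58)
)

dim(toy_df)   # number of rows and number of columns, in that order
nrow(toy_df)  # number of rows only
ncol(toy_df)  # number of columns only
```

Because nrow() and ncol() each return a single number, they are convenient to drop into a sentence as inline code.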
Now try to use them in the R inline code.
Hint: use `r ` for inline code, as we discussed earlier. Inline R code can be very useful when you are working with data.
Text in Quarto:
Output:
There are 21820 rows in the data and 9 columns.
Check the variable names and clean them
A good practice is to first check all the variable names and then clean them using the clean_names() function from the janitor package.
[1] "SL No." "Date Reported"
[3] "District" "Name"
[5] "Place" "Age"
[7] "Sex" "Date of death"
[9] "History(Traveler / contact)"
Look at the difference in the names() of the dataset once it has been cleaned by janitor:
[1] "sl_no" "date_reported"
[3] "district" "name"
[5] "place" "age"
[7] "sex" "date_of_death"
[9] "history_traveler_contact"
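The before-and-after comparison above can be reproduced on a small scale. In this sketch, raw_df is a made-up one-row data frame that borrows three of the raw column names shown earlier; only clean_names() comes from the text.

```r
library(janitor)

# Toy data frame with three of the raw column names (values are made up).
# check.names = FALSE keeps the spaces and punctuation in the names.
raw_df <- data.frame(
  `SL No.` = 1,
  `Date Reported` = "01-02-2021",
  `History(Traveler / contact)` = NA,
  check.names = FALSE
)

names(raw_df)

# clean_names() converts the names to lower-case snake_case.
clean_df <- clean_names(raw_df)
names(clean_df)
```

Cleaned names can be typed without backticks, which keeps the later dplyr code readable.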
The skim() function shows that date_reported, date_of_death, and sex are character variables, which might not be ideal. Let us transform them into the data types date and factor. It also shows that history_traveler_contact is mostly NA.
Let us drop the columns history_traveler_contact, name, and place from our analysis.
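One way to drop these columns is dplyr's select() with a minus sign. The sketch below assumes that approach and uses a made-up one-row tibble; the original workshop code may have done this differently.

```r
library(dplyr)

# Toy tibble with the columns to be dropped plus two we keep (values made up).
toy_df <- tibble(
  district = "Kollam",
  name = "A",
  place = "B",
  age = 70,
  history_traveler_contact = NA
)

# A minus sign inside select() removes the listed columns.
trimmed_df <- toy_df |>
  select(-c(history_traveler_contact, name, place))

names(trimmed_df)
```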
Let's check the class of date_reported.
Let's do some more cleaning of the variables.
When working with dates, the lubridate package is ideal.
Let's check the class of date_reported now.
pull() is an excellent function that lets you pull a single variable out of a dataset and perform operations on it. Read more about pull() on its help page (?pull).
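A small sketch of the class check before and after conversion. It assumes the dates are stored as day-month-year text, which is what lubridate::dmy() expects; the two dates in toy_df are made up.

```r
library(dplyr)
library(lubridate)

# Toy tibble with dates stored as day-month-year text (values are made up).
toy_df <- tibble(date_reported = c("28-03-2020", "01-04-2021"))

# pull() extracts the column as a plain vector, so class() can inspect it.
toy_df |> pull(date_reported) |> class()

# dmy() parses day-month-year text into proper Date values.
toy_df <- toy_df |>
  mutate(date_reported = dmy(date_reported))

toy_df |> pull(date_reported) |> class()
```

Once the column is a Date, it can be compared with other dates and used on a plot axis.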
Let us now look at the sex variable.
After some mutate() magic…
[1] Male Female Others - gf <NA>
Levels: - Female gf Male Others
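The output above shows stray entries such as - and gf among the levels. One possible way to tidy this, sketched here as an assumption rather than the workshop's actual code, is to name the valid levels explicitly so that everything else becomes NA. Treating only Male and Female as valid is itself an assumption, based on the later summaries listing just those two.

```r
library(dplyr)

# Toy tibble with the values seen in the output above.
toy_df <- tibble(sex = c("Male", "Female", "Others", "-", "gf", NA))

# Assumption: only "Male" and "Female" are kept as valid levels.
# Any value not listed in `levels` is converted to NA by factor().
toy_df <- toy_df |>
  mutate(sex = factor(sex, levels = c("Male", "Female")))

levels(toy_df$sex)
sum(is.na(toy_df$sex))
```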
We can pipe multiple mutate() functions too.
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `date_of_death = lubridate::dmy(date_of_death)`.
Caused by warning:
! 47 failed to parse.
Let us drop_na() for now
Let us look at the number of rows now
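The effect of a failed parse followed by drop_na() can be seen in a small sketch. The tibble toy_df and its values are made up; the middle entry is deliberately unparseable.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy tibble: the second date cannot be parsed (values are made up).
toy_df <- tibble(
  id = 1:3,
  date_of_death = c("12-05-2021", "not a date", "30-06-2021")
)

# dmy() returns NA for entries it cannot parse and raises a warning,
# which is silenced here only to keep the example output short.
toy_df <- toy_df |>
  mutate(date_of_death = suppressWarnings(dmy(date_of_death)))

nrow(toy_df)   # rows before dropping

# drop_na() removes every row that has an NA in any column.
toy_df <- toy_df |> drop_na()

nrow(toy_df)   # rows after dropping
```

Note that drop_na() with no arguments checks all columns, so it can remove more rows than expected when other variables also contain missing values.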
Let us look at the Districts
[1] "Thiruvananthapuram" "Kollam" "Pathanamthitta"
[4] "Kottayam" "Idukki" "Ernakulam"
[7] "Thrissur" "Palakkad" "Malappuram"
[10] "Kozhikode" "Wayanad" "Kasaragod"
[13] "Alappuzha" "Kannur" "Thiruvananthapura m"
[16] "THIRUVANANTHAPURAM" "KOLLAM" "PATHANAMTHITTA"
[19] "ALAPPUZHA" "KOTTAYAM" "IDUKKI"
[22] "ERNAKULAM" "THRISSUR" "PALAKKAD"
[25] "MALAPPURAM" "KOZHIKODE" "WAYANAD"
[28] "KANNUR" "KASARAGOD" "Kasargod"
[31] "Thiruvananthapuram?K" "Alappuzha?Kannur" "Kollam?Thiruvanantha"
[34] "Malappuaram" "Kozhikode?Thiruvanan" "Kozhikode?Ernakulam"
[37] "Eranakulam"
Let us clean it
mortality_df <- mortality_df |>
  # Standardise capitalisation first, then merge misspelt or garbled labels
  mutate(district = str_to_sentence(district)) |>
  mutate(district = fct_collapse(
    district,
    Thiruvananthapuram = c("Thiruvananthapura m", "Thiruvananthapuram?K"),
    Kollam = "Kollam?Thiruvanantha",
    Ernakulam = "Eranakulam",
    Kasaragod = "Kasargod",
    Kozhikode = c("Kozhikode?Ernakulam", "Kozhikode?Thiruvanan"),
    Alappuzha = "Alappuzha?Kannur",
    Malappuram = "Malappuaram"
  ))
Let us look at the number of districts now.
Let us now create a new variable called wave. This will tell us whether the death occurred in the first wave or the second wave of COVID-19.
For the workshop’s sake, let us consider April 2021 as the cut-off between the first and second waves of COVID-19 in Kerala.
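A minimal sketch of such a variable follows. Two choices here are assumptions and not taken from the workshop code: the cut-off is placed on 1 April 2021, and the comparison uses date_reported. The dates in toy_df are made up.

```r
library(dplyr)

# Assumption: the cut-off falls on 1 April 2021.
cutoff <- as.Date("2021-04-01")

# Toy tibble with dates on both sides of the cut-off (values are made up).
toy_df <- tibble(
  date_reported = as.Date(c("2020-08-15", "2021-03-31",
                            "2021-04-01", "2021-06-10"))
)

# Dates before the cut-off are labelled first wave, the rest second wave.
toy_df <- toy_df |>
  mutate(
    wave = if_else(date_reported < cutoff, "First wave", "Second wave"),
    wave = factor(wave, levels = c("First wave", "Second wave"))
  )

toy_df
```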
Let us create an age_group variable.
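One common way to bin a numeric age is base R's cut(). The breaks and labels below are hypothetical, chosen only for illustration, and the ages in toy_df are made up.

```r
library(dplyr)

# Toy ages sitting on and around the bin edges (values are made up).
toy_df <- tibble(age = c(0, 17, 18, 59, 60, 121))

# Hypothetical bins: 0-17, 18-59, and 60 or older.
# right = FALSE makes each interval closed on the left, e.g. [18, 60).
toy_df <- toy_df |>
  mutate(age_group = cut(
    age,
    breaks = c(0, 18, 60, Inf),
    labels = c("0-17", "18-59", "60+"),
    right = FALSE
  ))

toy_df
```

Checking the values on the bin edges, as done here, is a quick way to confirm that each boundary age lands in the intended group.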
Distribution of Age and Gender
Let's look at the distribution of age and sex among COVID-19 deaths in Kerala.
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 59 68 67 76 121
Male Female
12777 8832
# A tibble: 2 × 3
sex `mean(age)` `sd(age)`
<fct> <dbl> <dbl>
1 Male 66.3 13.6
2 Female 68.0 14.2
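The column names in the last table, mean(age) and sd(age), suggest a grouped summarise() with unnamed expressions. The sketch below reproduces that shape of output on a made-up four-row tibble; it is a guess at the form of the original code, not a copy of it.

```r
library(dplyr)

# Toy tibble (values are made up).
toy_df <- tibble(
  sex = factor(c("Male", "Male", "Female", "Female"),
               levels = c("Male", "Female")),
  age = c(60, 70, 65, 75)
)

summary(toy_df$age)   # five-number summary plus the mean
table(toy_df$sex)     # counts per level

# Unnamed expressions in summarise() become the column names as typed.
age_by_sex <- toy_df |>
  group_by(sex) |>
  summarise(mean(age), sd(age))

age_by_sex
```

Naming the expressions, for example mean_age = mean(age), gives column names that are easier to refer to later.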
Using gtsummary
Using the inline R code you can:
Output:
The median (IQR) age (in years) among males and females is 68 (58, 75) and 70 (60, 78), respectively.
Visualize using ggplot2
Distribution of Age groups and Waves
Using the inline R code you can:
Output:
The number of deaths in the First wave and Second wave of COVID-19 are 1,079 (24%) and 12,467 (73%), respectively.
Visualize using ggplot2
Let's make more sense of this plot with some mutate() magic again.
Now let us render this!